Abstract

I propose a method to fit the probability distribution function (hereafter PDF) of the large-scale
density field $\rho$, motivated by a Lagrangian version of the continuity equation. It consists of
applying the Edgeworth expansion to the quantity $\Phi = \log\rho - \langle\log\rho\rangle$. The method is tested on
the matter particle distribution in two cold dark matter $N$-body simulations of different physical
sizes, which together cover a large dynamic range. It is found to be very efficient, even in the non-linear regime,
and may thus be used as an analytical tool to study how the PDF is affected by the transition
between the weakly non-linear regime and the highly non-linear regime.

Subject headings: cosmology: large-scale structure of universe - galaxies: clustering 
1 Introduction



It is generally believed that the large-scale structures of the observed galaxy distribution have
grown under gravitational instability from small initial fluctuations. Many statistical tools exist to
compare models with observations. I consider here the probability distribution function $P(\rho,\ell)$
(PDF) of the density field smoothed with a filter window of typical size $\ell$, which can be a spherical
(cubical) cell of radius (size) $\ell$ or a gaussian of half-width $\ell$. Possible discreteness effects will be
neglected, and the initial fluctuations will be assumed gaussian.

In the weakly non-linear regime, i.e., when the fluctuations are small, one can use perturbation
theory to predict the behaviour of the PDF analytically (Bernardeau 1992; Kofman et al.
1993, hereafter KBGND; Juszkiewicz et al. 1993, hereafter JWACB). For example, JWACB
suggested using the Edgeworth expansion to fit the shape of the PDF in the weakly non-linear
regime and to evaluate how it deviates from the gaussian limit. Given the variance $\sigma^2(\ell) = \langle\delta^2\rangle$ of
the smoothed density contrast $\delta = \rho/\langle\rho\rangle - 1$, this method consists of expanding the function $P(\rho,\ell)$
as a series in powers of $\sigma$ around a gaussian of the same variance and the same average. In practice,
however, this method is valid only for a small or moderate variance, $\sigma^2 \lesssim 1$.

The non-linear case is more complicated, and numerical simulations are generally used to study
it. The PDF measured in $N$-body simulations as a function of $\rho$ in the non-linear regime
is strongly non-gaussian. It presents a power-law behaviour bounded by two exponential
cut-offs (Bouchet, Schaeffer & Davis 1991; Bouchet & Hernquist 1993). Such a behaviour
was predicted by Balian & Schaeffer (1988, 1989; see also Fry 1985), but their calculation requires
supplementary assumptions on the correlation hierarchy that are not proven to hold in general.

Fortunately, the continuity equation can be used as a guide into the non-linear regime. We can
write the density field as

$$\rho(\mathbf{x},t) \simeq \langle\rho\rangle \exp\left[\Phi(\mathbf{x},t)\right], \qquad \Phi(\mathbf{x},t) = -\int_{\rm trajectory} \frac{\nabla_{\mathbf{x}}\cdot\mathbf{v}}{a}\, dt. \qquad (1)$$

In equation (1), $a$ is the expansion factor, $\mathbf{x}$ the comoving coordinate and $\mathbf{v}$ the peculiar velocity
[$\mathbf{v}/a = d\mathbf{x}/dt$]. The integral is computed in a Lagrangian way, i.e., by following the particle trajectories,
so this expression is valid only before shell crossing. In the weakly non-linear regime,
and if standard linear theory applies, $\Phi$ is proportional to $\nabla\cdot\mathbf{v}$ and is thus gaussian. If, instead of
expanding equation (1) at first order, we keep the exponential to ensure the positivity of the
density, we see that $\rho$ is lognormal rather than gaussian, as already pointed out by Coles & Jones (1991).

The validity of this simple reasoning is confirmed by recent measurements on observed three-dimensional
galaxy catalogs (Hamilton 1985; Bouchet et al. 1993) as well as on cold dark matter
$N$-body simulations (KBGND), which indicate that the lognormal distribution (hereafter LNDF) is
a very good fit to the data. Note that Hubble already used the LNDF in 1934 to fit the PDF
measured on the projected galaxy distribution.

This suggests applying the Edgeworth expansion to the variable $\Phi$ rather than to the density itself,
and using equation (1) as a non-linear filter that ensures the positivity of $\rho$. This letter is organized
as follows. In Sect. 2, I recall the Edgeworth expansion (up to third order) and compute a fit to
the PDF of the smoothed density field by applying this approximation to the measured variable $\Phi$,
or more precisely to $\Phi = \log\rho - \langle\log\rho\rangle$, where $\rho$ is the smoothed density
field. I also relate the moments of $\rho$ to those of $\Phi$. In Sect. 3, I test the
performance of the approximation by measuring the PDF of the matter particle distribution
in two cold dark matter (hereafter CDM) $N$-body simulations. Section 4 is the conclusion.
2 The "skewed" lognormal approximation

In the following, I will not consider exactly the quantity $\Phi$ given by equation (1) but, for the sake
of simplicity, the zero-mean quantity $\Phi = \log\rho - \langle\log\rho\rangle$, where $\rho$ is the smoothed density field.
I will also drop the explicit scale dependence of the PDF, although it is implicitly assumed.

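Let $\sigma_\Phi^2 = \langle\Phi^2\rangle$ be the variance of $\Phi$. At third order in $\sigma_\Phi$, the Edgeworth approximation reads
(e.g., JWACB)

$$P_{E,3}(\Phi)\,d\Phi = \left\{1 + \frac{\sigma_\Phi T_3}{6} H_3(\nu) + \sigma_\Phi^2\left[\frac{T_4}{24} H_4(\nu) + \frac{T_3^2}{72} H_6(\nu)\right]\right\} G(\nu)\,d\nu, \qquad (2)$$

where $\nu = \Phi/\sigma_\Phi$, $H_m(x) = (-1)^m \exp(x^2/2)\, d^m[\exp(-x^2/2)]/dx^m$ is the Hermite polynomial of degree $m$, and $G(\nu)$ is a
gaussian of zero average and unit variance. The quantities $T_3(\ell)$ and $T_4(\ell)$ are the renormalized
skewness and the renormalized kurtosis of the field $\Phi$:

$$T_3 = \frac{\langle\Phi^3\rangle}{\sigma_\Phi^4}, \qquad T_4 = \frac{\langle\Phi^4\rangle - 3\sigma_\Phi^4}{\sigma_\Phi^6}. \qquad (3)$$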
The average $\langle\Phi^Q\rangle$ is defined by $\langle\Phi^Q\rangle = \int \Phi^Q P(\Phi)\,d\Phi$. The form (2) ensures that the variance,
the skewness and the kurtosis of the PDF $P_{E,3}(\Phi)$ are also $\sigma_\Phi^2$, $T_3$ and $T_4$. At first order in $\sigma_\Phi$,
the Edgeworth approximation $P_{E,1}$ would simply give a gaussian. The second-order correction
in $\sigma_\Phi$ accounts for the asymmetry of the PDF, and thus a term
proportional to $T_3\sigma_\Phi H_3(\nu)$ appears in $P_{E,2}$. The third order accounts for the flattening of
the PDF. Note that the function $P_{E,k}(\Phi)$ is not really a distribution function, since it is
not positive definite in general; it has to be considered only as an approximation valid at $k$th
order in $\sigma_\Phi$.
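As a minimal sketch (not the author's code), the truncated expansion (2) can be evaluated with NumPy's `numpy.polynomial.hermite_e.hermeval`, which sums a series of probabilists' Hermite polynomials $He_m$; this assumes the $H_m$ of equation (2) are exactly these polynomials. The names `edgeworth_pdf` and `check_moments` are hypothetical; the second one integrates on a uniform grid to confirm that $P_{E,3}$ reproduces $\sigma_\Phi^2$, $T_3$ and $T_4$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def edgeworth_pdf(phi, sigma_phi, t3=0.0, t4=0.0, k=3):
    """Edgeworth approximation P_{E,k}(Phi) of eq. (2), truncated at order k.

    k=1 gives a gaussian, k=2 adds the skewness term, k=3 the kurtosis terms.
    The result is a density in Phi, hence the 1/sigma_phi from d(nu)/d(Phi).
    """
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    nu = np.asarray(phi, dtype=float) / sigma_phi
    coeffs = np.zeros(7)  # coefficients of He_0 ... He_6
    coeffs[0] = 1.0
    if k >= 2:
        coeffs[3] = sigma_phi * t3 / 6.0
    if k >= 3:
        coeffs[4] = sigma_phi**2 * t4 / 24.0
        coeffs[6] = sigma_phi**2 * t3**2 / 72.0
    gauss = np.exp(-0.5 * nu**2) / np.sqrt(2.0 * np.pi)
    return hermeval(nu, coeffs) * gauss / sigma_phi


def check_moments(sigma_phi, t3, t4, k=3, n=30001, span=15.0):
    """Return (norm, variance, T3, T4) of P_{E,k} from a Riemann sum on a uniform grid."""
    phi = np.linspace(-span * sigma_phi, span * sigma_phi, n)
    dphi = phi[1] - phi[0]
    p = edgeworth_pdf(phi, sigma_phi, t3, t4, k)
    norm = np.sum(p) * dphi
    var = np.sum(phi**2 * p) * dphi
    m3 = np.sum(phi**3 * p) * dphi
    m4 = np.sum(phi**4 * p) * dphi
    return norm, var, m3 / var**2, (m4 - 3.0 * var**2) / var**3


print(check_moments(0.8, 0.5, 0.3))  # approximately (1.0, 0.64, 0.5, 0.3)
```

With the hypothetical values $\sigma_\Phi = 1$, $\sigma_\Phi T_3 = 0.55$ and $\sigma_\Phi^2 T_4 = 0.45$, the bracket in (2) is about $-0.49$ at $\nu = -3$, so `edgeworth_pdf(-3.0, 1.0, 0.55, 0.45)` is negative: a direct illustration that $P_{E,k}$ is not positive definite.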

The PDF of the density field can now be approximated by the function

$$P_k(\delta)\,d\delta = P_k(\rho)\,d\rho = \frac{1}{\rho} P_{E,k}\left[\log(\rho/\rho_k)\right] d\rho, \qquad (4)$$

where the factor

$$\rho_k = \langle\rho\rangle \left[\int P_{E,k}(\log x)\,dx\right]^{-1} \qquad (5)$$

ensures mass conservation, i.e., $\langle\rho\rangle_k = \int \rho P_k(\rho)\,d\rho = \langle\rho\rangle$.

The function $P_1(\rho) = (\sqrt{2\pi}\,\rho\,\sigma_\Phi)^{-1}\exp\{-[\log(\rho/\rho_1)]^2/(2\sigma_\Phi^2)\}$ is nothing but a LNDF. Since the
function $P_k(\rho)$ [or equivalently $P_k(\delta)$] involves corrections to the LNDF when $k \geq 2$, I will
call it a "skewed" LNDF, hereafter SLNDF$_k$, $k = 1,2,3$.
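The construction (4)-(5) can be sketched as follows, as an illustration under stated assumptions rather than the author's implementation. $P_{E,k}$ is re-implemented compactly so the block stands alone, and the integral in (5) is evaluated as $\int P_{E,k}(\Phi)\,e^{\Phi}\,d\Phi$ (the substitution $x = e^\Phi$) with a simple Riemann sum. The names `rho_k` and `slndf` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def edgeworth_pdf(phi, sigma_phi, t3, t4, k):
    """Eq. (2) truncated at order k, in the probabilists' Hermite basis He_m."""
    nu = np.asarray(phi, dtype=float) / sigma_phi
    c = np.zeros(7)
    c[0] = 1.0
    if k >= 2:
        c[3] = sigma_phi * t3 / 6.0
    if k >= 3:
        c[4] = sigma_phi**2 * t4 / 24.0
        c[6] = sigma_phi**2 * t3**2 / 72.0
    return hermeval(nu, c) * np.exp(-0.5 * nu**2) / (np.sqrt(2 * np.pi) * sigma_phi)


def rho_k(sigma_phi, t3, t4, k, mean_rho=1.0, n=30001, span=15.0):
    """Eq. (5): rho_k = <rho> / int P_{E,k}(log x) dx, computed in the variable Phi."""
    phi = np.linspace(-span * sigma_phi, span * sigma_phi, n)
    dphi = phi[1] - phi[0]
    integral = np.sum(edgeworth_pdf(phi, sigma_phi, t3, t4, k) * np.exp(phi)) * dphi
    return mean_rho / integral


def slndf(rho, sigma_phi, t3=0.0, t4=0.0, k=3, mean_rho=1.0):
    """Eq. (4): P_k(rho) = P_{E,k}[log(rho / rho_k)] / rho (may be negative for k >= 2)."""
    rho = np.asarray(rho, dtype=float)
    rk = rho_k(sigma_phi, t3, t4, k, mean_rho)
    return edgeworth_pdf(np.log(rho / rk), sigma_phi, t3, t4, k) / rho


print(slndf([0.5, 1.0, 2.0], 1.0, 0.55, 0.45, k=3))
```

Because the Hermite terms integrate to zero, the SLNDF$_k$ built this way stays normalized and has mean $\langle\rho\rangle$ for any $k$, while for $k = 1$ it reduces to the lognormal $P_1(\rho)$ with $\rho_1 = \langle\rho\rangle\exp(-\sigma_\Phi^2/2)$.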

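The moment of order $Q$ of the SLNDF$_k$ can be evaluated analytically. Let us write it as
$\langle\rho^Q\rangle_k = \int \rho^Q P_k(\rho)\,d\rho$. We easily obtain

$$\langle\rho^Q\rangle_k = \langle\rho\rangle^Q f_k(Q)\,[f_k(1)]^{-Q} \exp\left[Q(Q-1)\sigma_\Phi^2/2\right], \qquad (6)$$

with

$$f_1(Q) = 1, \qquad (7)$$

$$f_2(Q) = 1 + \frac{1}{6} Q^3 T_3 \sigma_\Phi^4, \qquad (8)$$

$$f_3(Q) = 1 + \frac{1}{6} Q^3 T_3 \sigma_\Phi^4 + \frac{1}{24} Q^4 T_4 \sigma_\Phi^6 + \frac{1}{72} Q^6 T_3^2 \sigma_\Phi^8. \qquad (9)$$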
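Equation (6) follows from the identity $\int e^{a\nu} H_m(\nu)\, G(\nu)\,d\nu = a^m e^{a^2/2}$ applied term by term to (2) with $a = Q\sigma_\Phi$. The hypothetical check below codes $f_k(Q)$ and equation (6), then compares them with a direct numerical integration of $\rho^Q$ against the SLNDF$_k$, writing $\rho = \rho_k e^\Phi$; it is a sketch, not part of the original analysis.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def f_k(q, sigma_phi, t3=0.0, t4=0.0, k=3):
    """Correction factor f_k(Q) of eqs. (7)-(9)."""
    s2 = sigma_phi**2
    f = 1.0
    if k >= 2:
        f += q**3 * t3 * s2**2 / 6.0
    if k >= 3:
        f += q**4 * t4 * s2**3 / 24.0 + q**6 * t3**2 * s2**4 / 72.0
    return f


def rho_moment(q, sigma_phi, t3=0.0, t4=0.0, k=3, mean_rho=1.0):
    """Analytic moment <rho^Q>_k of the SLNDF_k, eq. (6)."""
    return (mean_rho**q * f_k(q, sigma_phi, t3, t4, k)
            * f_k(1, sigma_phi, t3, t4, k) ** (-q)
            * np.exp(q * (q - 1) * sigma_phi**2 / 2.0))


def rho_moment_numeric(q, sigma_phi, t3=0.0, t4=0.0, k=3, mean_rho=1.0,
                       n=30001, span=15.0):
    """Same moment by direct integration over Phi, with rho = rho_k exp(Phi)."""
    phi = np.linspace(-span * sigma_phi, span * sigma_phi, n)
    dphi = phi[1] - phi[0]
    nu = phi / sigma_phi
    c = np.zeros(7)
    c[0] = 1.0
    if k >= 2:
        c[3] = sigma_phi * t3 / 6.0
    if k >= 3:
        c[4] = sigma_phi**2 * t4 / 24.0
        c[6] = sigma_phi**2 * t3**2 / 72.0
    p_e = hermeval(nu, c) * np.exp(-0.5 * nu**2) / (np.sqrt(2 * np.pi) * sigma_phi)
    rho_k = mean_rho / (np.sum(p_e * np.exp(phi)) * dphi)  # eq. (5)
    return rho_k**q * np.sum(p_e * np.exp(q * phi)) * dphi


for k in (1, 2, 3):
    print(k, rho_moment(3, 0.9, 0.6, 0.5, k), rho_moment_numeric(3, 0.9, 0.6, 0.5, k))
```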
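Table 1: Values of statistical estimators in various approximations for the CDM case

| $\ell$ (Mpc) | $\sigma^2$ | $\sigma_\Phi^2$ | $\sigma_\Phi T_3$ | $\sigma_\Phi^2 T_4$ | $S_3$ | $S_3^{\rm COR}$ | $S_3^{\rm wea}$ | $S_{3,1}$ | $S_{3,2}$ | $S_{3,3}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 2.5 | 9.3 | 1.4 | 0.55 | 0.45 | 5.0 | 7.2 |  | 6.0 | 6.1 | 7.5 |
| 4.0 | 3.6 | 1.2 | 0.40 | 0.25 | 4.2 | 6.3 |  | 5.2 | 5.7 | 7.4 |
| 6.4 | 1.5 | 0.8 | 0.25 | -0.06 | 3.6 | 5.1 |  | 4.2 | 4.9 | 5.6 |
| 12 | 0.6 | 0.6 | -0.15 | 0.06 | 4.2 | 4.2 | 3.4 | 3.8 | 2.8 | 3.6 |
| 23 | 0.2 | 0.2 | -0.06 | -0.03 | 3.2 | 3.2 | 3.0 | 3.2 | 3.0 | 2.9 |
| 46 | 0.04 | 0.04 | -0.20 | -0.15 | 2.0 | n.a. | 2.6 | 3.0 | 2.1 | 2.0 |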



Let $\sigma^2 = \langle\delta^2\rangle$ be the variance of the density contrast, and $S_3$ and $S_4$ the skewness and kurtosis
of $\delta$, defined in the same way as $T_3$ and $T_4$ (i.e., eq. [3]). Let $\sigma_k^2 = \langle\delta^2\rangle_k$, $S_{3,k}$ and $S_{4,k}$ be the
variance, the skewness and the kurtosis of the SLNDF$_k$. In general, $\sigma^2 \neq \sigma_k^2$ and $S_i \neq S_{i,k}$,
$i = 3,4$, except in the weakly non-linear limit (but this depends on the quantity considered and on $k$;
see Appendix). Only the averages of $P_k(\rho)$ and $P(\rho)$ are equal. Note that the PDF $P_k(\rho)$ is not
used in the usual way to fit the PDF $P(\rho)$, i.e., by requiring that the low-order moments of the
density distribution be equal for the fit and for the real PDF. It is rather used in such a way
that the low-order moments of the field $\Phi$ are equal for the approximation $P_{E,k}(\Phi)$ and the
real PDF $P(\Phi)$.

The exponential function in equation (6) generally makes $\sigma^2$ much larger than $\sigma_\Phi^2$ in the non-linear
regime. For example, in the lognormal case ($k = 1$), equation (6) with $Q = 2$ gives $1 + \sigma^2 = \exp(\sigma_\Phi^2)$, hence
$\sigma^2 \lesssim e - 1 \simeq 1.7$ as long as $\sigma_\Phi^2 \lesssim 1$. The Edgeworth expansion was found by JWACB to be valid in practice
for variances $\lesssim 1$; applied to $\Phi$, this means $\sigma_\Phi^2 \lesssim 1$. We can thus expect the SLNDF$_k$ to be a good fit to $P(\rho)$ in the non-linear
regime, at least for moderate values of $\sigma^2$, of order a few units.
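A short sketch, assuming only equations (6)-(9), turns these moments into the variance, skewness and kurtosis of the SLNDF$_k$ defined above. With $m_Q = \langle\rho^Q\rangle_k/\langle\rho\rangle^Q$, one has $\sigma_k^2 = m_2 - 1$, $\langle\delta^3\rangle_k = m_3 - 3m_2 + 2$ and $\langle\delta^4\rangle_k = m_4 - 4m_3 + 6m_2 - 3$. For $k = 1$ this reproduces the exact lognormal results $S_3 = 3 + \sigma^2$ and $S_4 = 16 + 15\sigma^2 + 6\sigma^4 + \sigma^6$. The function name `slndf_stats` is hypothetical.

```python
import math


def f_k(q, sigma_phi, t3=0.0, t4=0.0, k=3):
    """Correction factor f_k(Q) of eqs. (7)-(9)."""
    s2 = sigma_phi**2
    f = 1.0
    if k >= 2:
        f += q**3 * t3 * s2**2 / 6.0
    if k >= 3:
        f += q**4 * t4 * s2**3 / 24.0 + q**6 * t3**2 * s2**4 / 72.0
    return f


def reduced_moment(q, sigma_phi, t3, t4, k):
    """m_Q = <rho^Q>_k / <rho>^Q from eq. (6)."""
    return (f_k(q, sigma_phi, t3, t4, k) / f_k(1, sigma_phi, t3, t4, k) ** q
            * math.exp(q * (q - 1) * sigma_phi**2 / 2.0))


def slndf_stats(sigma_phi, t3=0.0, t4=0.0, k=3):
    """Return (sigma_k^2, S_{3,k}, S_{4,k}) of the density contrast delta."""
    m2, m3, m4 = (reduced_moment(q, sigma_phi, t3, t4, k) for q in (2, 3, 4))
    var = m2 - 1.0
    mu3 = m3 - 3.0 * m2 + 2.0             # <delta^3>_k
    mu4 = m4 - 4.0 * m3 + 6.0 * m2 - 3.0  # <delta^4>_k
    return var, mu3 / var**2, (mu4 - 3.0 * var**2) / var**3


var, s3, s4 = slndf_stats(1.0, k=1)  # lognormal with sigma_Phi^2 = 1
print(round(var, 3))                 # 1.718, i.e. e - 1
```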

3 Application: measurement of the PDF in CDM samples 

Bouchet et al. (1991) and Colombi et al. (1994, hereafter CBS) have measured the PDF for cubical
cells of size $\ell$ in the distribution of matter particles from two CDM $N$-body simulations
generated with a P$^3$M code and involving 262 144 matter particles. The first simulation (Davis
& Efstathiou 1988), hereafter CDM1, has a moderate physical size $L_1 = 64$ Mpc (with $H_0 = 50$
km/s/Mpc), which allows us to measure the PDF in the non-linear regime ($\langle\delta^2\rangle \gtrsim 1$). The
second simulation (Frenk et al. 1990), hereafter CDM2, has a large size $L_2 = 360$ Mpc and is thus
appropriate for measuring the PDF in the weakly non-linear regime ($\langle\delta^2\rangle \lesssim 1$). Both simulations
were stopped when the variance of the distribution in a sphere of radius 16 Mpc was $1/b^2$ with
$b \simeq 2.4$, so they are two different realizations of the same underlying statistics. The first column of
Table 1 gives the scale, in Mpc, at which the PDF has been measured.

Since the samples considered here are sets of points, their discrete nature has to be taken into
account. Let $\tilde P(\tilde\rho)$ be the measured PDF and $P(\rho)$ the real PDF [i.e., the continuum limit of $\tilde P(\tilde\rho)$].
Up to a normalization factor, $\tilde P(\tilde\rho)$ is nothing but the count probability $P_N(\ell)$ (hereafter CPDF),
defined as the probability of having $N$ objects in a cell of size $\ell$ thrown at random in the sample.
To estimate $T_3$ and $T_4$, I use the indicator $\tilde\Phi = \log\tilde\rho - \langle\log\tilde\rho\rangle$, with $\tilde\rho = N/\langle N\rangle$, and thus

$$\langle\log\tilde\rho\rangle = \sum_{N\geq1} \log(N/\langle N\rangle)\, P_N, \qquad (10)$$

where $\langle N\rangle = \sum_N N P_N$. The moment of order $Q$ of $\tilde\Phi$ will be estimated by

$$\langle\tilde\Phi^Q\rangle = \sum_{N\geq1} \left[\log(N/\langle N\rangle) - \langle\log\tilde\rho\rangle\right]^Q P_N. \qquad (11)$$
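Equations (10) and (11) translate directly into code. The sketch below (hypothetical function name `phi_moments_from_cpdf`) takes the CPDF as an array indexed by $N$; following the text, the sums start at $N = 1$ because $\log 0$ is undefined, and they are not renormalized, which presumes that $P_0$ is negligible.

```python
import numpy as np


def phi_moments_from_cpdf(p_n):
    """Estimate (sigma_Phi^2, T3, T4) from a CPDF via eqs. (10)-(11).

    p_n[N] is the probability of finding N objects in a cell. The sums run
    over N >= 1 only and are not renormalized (P_0 assumed negligible).
    """
    p_n = np.asarray(p_n, dtype=float)
    n = np.arange(p_n.size)
    mean_n = np.sum(n * p_n)              # <N> = sum_N N P_N
    log_rho = np.log(n[1:] / mean_n)      # log(N/<N>) for N >= 1
    p1 = p_n[1:]
    mean_log = np.sum(log_rho * p1)       # eq. (10)

    def moment(q):                        # eq. (11)
        return np.sum((log_rho - mean_log) ** q * p1)

    var = moment(2)
    t3 = moment(3) / var**2
    t4 = (moment(4) - 3.0 * var**2) / var**3
    return var, t3, t4


# Equal weight at N = 1 and N = 4, so that Phi = -log 2 or +log 2.
print(phi_moments_from_cpdf([0.0, 0.5, 0.0, 0.0, 0.5]))
```

For this toy CPDF, $\langle N\rangle = 2.5$ and $\langle\log\tilde\rho\rangle = \log 0.8$, so $\tilde\Phi = \pm\log 2$, giving $\sigma_\Phi^2 = (\log 2)^2$, $T_3 = 0$ and $T_4 = -2/(\log 2)^2$.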

However, the sums (10) and (11) give a lot of weight to small values of $N$, which are contaminated by
discreteness effects and may thus provide a very poor estimate of the real moment of order $Q$ of
the variable $\Phi$. Yet if the measured function $P_N(\ell)$ presents an exponential cut-off at small
$N$, the contamination by discreteness will be small. Such a cut-off is expected if the cell size is
large compared to the mean distance between two objects in an underdense region. In practice, I
simply required the maximum of the CPDF to lie at $N > 1$. But since the discrete nature of the
CPDF is intrinsically taken into account in the sums (10) and (11), the function actually fitted
is the measured CPDF and not the PDF of the underlying continuous density field.
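The CPDF itself can be measured by throwing cubical cells at random in a periodic box and histogramming the counts. The sketch below is a generic counts-in-cells estimator written for illustration; the helper names `count_in_cells` and `peak_above_one`, the periodic wrapping and the brute-force loop are assumptions, not the procedure of CBS. It also applies the practical criterion quoted above, that the maximum of the CPDF lie at $N > 1$.

```python
import numpy as np


def count_in_cells(positions, box_size, cell_size, n_cells, seed=0):
    """CPDF P_N for cubical cells of side cell_size thrown at random in a periodic box.

    positions: (n_particles, 3) array of coordinates in [0, box_size).
    Returns p_n, where p_n[N] is the fraction of cells containing N particles.
    """
    rng = np.random.default_rng(seed)
    corners = rng.uniform(0.0, box_size, size=(n_cells, 3))
    counts = np.empty(n_cells, dtype=np.int64)
    for i, corner in enumerate(corners):
        # Offset of every particle from the cell corner, wrapped periodically.
        offset = np.mod(positions - corner, box_size)
        counts[i] = np.count_nonzero(np.all(offset < cell_size, axis=1))
    return np.bincount(counts) / n_cells


def peak_above_one(p_n):
    """Practical criterion of the text: the CPDF maximum must lie at N > 1."""
    return int(np.argmax(p_n)) > 1


# Unclustered (Poisson) example: 8000 particles, 8 per cell on average.
rng = np.random.default_rng(1)
particles = rng.uniform(0.0, 1.0, size=(8000, 3))
p_n = count_in_cells(particles, box_size=1.0, cell_size=0.1, n_cells=2000)
print(np.sum(np.arange(p_n.size) * p_n), peak_above_one(p_n))
```

Clustered $N$-body particles would simply replace the uniform `particles` array; the loop costs one pass over all particles per cell, which is adequate for a sketch but not for production measurements.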

The second and third columns of Table 1 give the measured quantities $\sigma^2$ and $\sigma_\Phi^2$, respectively.
As expected (see Appendix), the variance of the field $\Phi$ is equal to $\sigma^2$ in the weakly non-linear
limit. In the non-linear regime, $\sigma_\Phi^2$ increases much more slowly than $\sigma^2$ and is still only of order
unity when $\sigma^2 \sim 10$, in agreement with the discussion at the end of the previous section.

Figure 1 shows the measured "mass" distribution function $P(M,\ell)$ as a function of $M$, compared
with the SLNDF$_k$, $k = 1,2,3$. To have the same mass scales in CDM1 and CDM2, I take
$M = (L_i/L_1)^3 N$, where $N$ is the number of objects measured in a cell thrown at random in CDM$i$.
The variance of the PDF measured at each scale increases with $\max[P(M,\ell)]$ and crosses unity
between the third and the fourth curves.
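As a worked example, assuming the rescaling reads $M = (L_i/L_1)^3 N$ and that CDM1 and CDM2 contain the same number of particles, a count $N$ in CDM2 is converted with the factor

$$\frac{M}{N} = \left(\frac{L_2}{L_1}\right)^3 = \left(\frac{360}{64}\right)^3 = 5.625^3 \simeq 178,$$

so one CDM2 particle represents roughly 178 CDM1 particle masses, while $M = N$ for CDM1.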

We see that the LNDF ($k = 1$), although a first-order approximation, is already a good fit,
particularly in the weakly non-linear regime, where it cannot be distinguished from the
measurement, as noticed earlier by KBGND. But this property is certainly specific to the
CDM matter distribution. Indeed, a careful examination of equation (2) shows that the
quantities $\sigma_\Phi T_3$ and $\sigma_\Phi^2 T_4$ measure the size of the low-order corrections to the LNDF needed
to fit the measured PDF. Equations (13) and (14) of the Appendix then imply that if, in the weakly
non-linear regime, $S_3 \simeq 3$ and $S_4 \simeq 16$, then $T_3$ and $T_4$ should be small, and hence $\sigma_\Phi T_3$ and $\sigma_\Phi^2 T_4$
even smaller. Their measured values are given in the fourth and fifth columns of Table 1. The
value of $S_3$ in the regime $\sigma^2 \ll 1$ can be evaluated with second-order perturbation theory (e.g.,
Juszkiewicz et al. 1993). The result is given in the 8th column of Table 1 [$S_3^{\rm wea}$], and we indeed
have $S_3 \simeq 3$ in this regime.
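The second-order (tree-level) prediction can be sketched with the standard result for a spherical top-hat window in an Einstein-de Sitter universe, $S_3 = 34/7 + \gamma_1$ with $\gamma_1 = d\log\sigma^2/d\log\ell$ (Juszkiewicz, Bouchet & Colombi 1993; Bernardeau 1994). This is an illustrative assumption, not necessarily how the $S_3^{\rm wea}$ column was computed: cubical cells are treated as spheres, $\sigma^2$ should be the linear variance, and the local slope is taken by finite differences through the hypothetical helper `s3_tree_level_tophat`.

```python
import numpy as np


def s3_tree_level_tophat(scales, variances):
    """Tree-level skewness S3 = 34/7 + gamma_1 for a spherical top-hat window.

    gamma_1 = d log sigma^2 / d log l is estimated by finite differences on
    tabulated (scale, linear variance) pairs. Assumes second-order perturbation
    theory in an Einstein-de Sitter universe; cubical cells are not treated exactly.
    """
    log_l = np.log(np.asarray(scales, dtype=float))
    log_var = np.log(np.asarray(variances, dtype=float))
    gamma1 = np.gradient(log_var, log_l)
    return 34.0 / 7.0 + gamma1


# Power-law check: sigma^2 ~ l^-(n+3) with n = -1 should give S3 = 34/7 - 2 = 20/7.
scales = np.array([10.0, 20.0, 40.0])
print(s3_tree_level_tophat(scales, (scales / 10.0) ** -2.0))
```

For a power-law spectrum $\langle|\delta_k|^2\rangle \propto k^n$ this reduces to $S_3 = 34/7 - (n+3)$, independent of scale.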

When $\sigma^2 \gtrsim 1$, the LNDF does not fit the global shape of the measured CPDF very well,
but the second-order approximation SLNDF$_2$ is much better and the third-order one, SLNDF$_3$, is
an almost perfect fit (if one disregards the low-$M$ part of the curves, which is in any case contaminated by
discreteness effects). The reason is that the corrective terms $\sigma_\Phi T_3$ and $\sigma_\Phi^2 T_4$ are not, in this
regime, a small fraction of unity. We can nevertheless say that the PDF measured in the CDM matter
distribution does not deviate "very much" from a LNDF, because we still have $|\sigma_\Phi^2 T_4| \lesssim |\sigma_\Phi T_3| \lesssim 1$.
This last inequality also explains why the SLNDF$_k$ is a good approximation, since the size of
the correction seems to decrease with increasing $k$.

The last six columns of Table 1 give the measured values of the skewness in various
approximations. The 6th column gives the direct measurement of $S_3$ in CDM1 and CDM2; the next one
gives the more realistic values $S_3^{\rm COR}$ of the skewness obtained when correcting the measured CPDF
for finite-volume effects with the method explained in CBS. The last three columns give the
measured $S_{3,k}$, $k = 1,2,3$. The LNDF is only a first-order approximation ($k = 1$) and should not
give the appropriate value of $S_3$, even in the weakly non-linear regime (see Appendix). But in the
case of the CDM model, as discussed above, it provides a reasonable approximation of $S_3$. As expected, the
measured $S_{3,2}$ and $S_{3,3}$ give a rather good estimate of $S_3$ in the weakly non-linear
regime. In the non-linear regime, $S_{3,2}$ tends to underestimate the value of $S_3$ derived from the
measured CPDF, whereas $S_{3,3}$ tends to overestimate it. Note that the measured numbers $S_{3,k}$,
$k = 2,3$, are much closer to the real value of $S_3$ (corrected for finite-volume effects, i.e., $S_3^{\rm COR}$) than
to the direct measurement ($S_3$). Therefore, when used together, they provide (in this particular
example) good estimators of the skewness.

Figure 1: The PDF as measured in the two samples CDM1 and CDM2 (solid curves) compared to the
"skewed" lognormal approximation SLNDF$_k$, $k = 1$ (dots in left panel), $k = 2$ (short dashes in middle
panel) and $k = 3$ (long dashes in right panel). In each panel, the three left curves correspond to the
measurement on CDM1 and the three right curves to the measurement on CDM2. The quantity represented
is the "mass" PDF $P(M,\ell)$ as a function of $M$ for various scales (see text and Table 1) in logarithmic
coordinates. The fit is given only for the available values of $M$ (i.e., those for which the measured $P(M,\ell)$ does not
vanish), which explains the possible interruption of the curves at low and high $M$. The Edgeworth
expansion, when directly applied to the PDF, would provide a good fit (improving with increasing order $k$)
only in the weakly non-linear regime.

4 Conclusion 

In this letter, I have proposed a new approximation to fit the probability distribution function
(PDF) of the large-scale density field $\rho$, which I call the "skewed" lognormal approximation
(SLNDF). It consists of applying the Edgeworth expansion (e.g., JWACB) to the PDF of the
statistical quantity $\Phi = \log\rho - \langle\log\rho\rangle$. This idea is motivated by writing the continuity equation in
a Lagrangian way. The SLNDF has been tested on the matter particle distribution from two cold
dark matter (CDM) simulations, one of large and one of small physical size, to cover a
large dynamic range. In this example, it provides a very good fit to the measured
PDF, even in the non-linear regime. This approximation is not expected to be valid only in the particular
case of the matter distribution in a CDM universe. A more general study is in progress, concerning
the measurement of the CPDF in scale-invariant simulations [$\langle|\delta_k|^2\rangle \propto k^n$, with $n = 1, 0, -1, -2$],
and the preliminary results indeed confirm the efficiency of the approximation (Colombi, Bouchet
& Hernquist 1994).

The SLNDF is not positive definite, so it is not a true PDF and has to be used with caution.
However, when used at the appropriate order, it provides a new estimator of the skewness $S_3$ of
the PDF. It can be used to study the behaviour of the PDF in both the weakly non-linear regime
and the non-linear regime, and may thus allow a better understanding of the transition toward the
non-linear regime.

SC thanks F.R. Bouchet, J. A. Frieman and A. Stebbins for useful comments. SC is supported 
by DOE and by NASA through grant NAGW-2381 at Fermilab. Part of this work was done while 
SC was at the Institut d'Astrophysique de Paris (CNRS), supported by Ecole Polytechnique. 

Appendix: values of the variance, the skewness and the kurtosis
of the SLNDF$_k$ in the weakly non-linear regime

In the weakly non-linear limit $\sigma^2 \ll 1$, we have, with the notation of Sect. 2,



with $O_n = O(\sigma^n)$. Thus, in the weakly non-linear regime, the SLNDF$_1$ has the same variance as
the PDF, but its skewness and its kurtosis are fixed, $S_{3,1} = 3$, $S_{4,1} = 16$. In contrast, for
the real PDF, $S_3$ and $S_4$ should depend on the initial conditions (e.g., Juszkiewicz, Bouchet
& Colombi 1993; Bernardeau 1994 and references therein). To obtain the appropriate skewness, the
second-order correction is needed, which fixes $T_3$. Only the third-order approximation gives
the true kurtosis in the weakly non-linear regime, which fixes $T_4$. Note that the SLNDF$_3$ could
provide a way to understand the transition of the skewness toward the non-linear regime through the
second line of equation (14).
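For reference, a leading-order expansion of equations (6)-(9) for small $\sigma_\Phi$, sketched here under the assumption that the cumulants of $\Phi$ scale as implied by equation (3) ($\langle\Phi^3\rangle \propto \sigma_\Phi^4$, $\langle\Phi^4\rangle - 3\sigma_\Phi^4 \propto \sigma_\Phi^6$), gives

$$\sigma_k^2 = \sigma_\Phi^2 + O(\sigma_\Phi^4), \qquad S_{3,k} = 3 + \tau_3 + O(\sigma_\Phi^2), \qquad S_{4,k} = 16 + 12\,\tau_3 + \tau_4 + O(\sigma_\Phi^2),$$

where the illustrative symbols $(\tau_3, \tau_4)$ equal $(0, 0)$ for $k = 1$, $(T_3, 0)$ for $k = 2$ and $(T_3, T_4)$ for $k = 3$. The same relations with the measured $T_3$ and $T_4$ hold for the real PDF, which is why only the SLNDF$_2$ and SLNDF$_3$ recover the skewness and only the SLNDF$_3$ recovers the kurtosis.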